Improving Direct Counting for Frequent Itemset Mining
Abstract
During the last ten years, many algorithms have been proposed to mine frequent itemsets. To evaluate their behavior fairly, the IEEE/ICDM Workshop on Frequent Itemset Mining Implementations (FIMI’03) was recently organized. According to its analysis, kDCI++ is a state-of-the-art algorithm. However, the FIMI’03 experiments show that it is less efficient at low minimum supports, especially on sparse databases. To improve kDCI++ and make it more competitive, we present the kDCI-3 algorithm. It directly accesses candidates not only in the first iterations but also, and especially, in the third one, which in general accounts for the highest computational cost of kDCI++ at low minimum supports. kDCI-3 outperformed kDCI++ in the conducted experiments. In comparisons with other important algorithms, kDCI-3 achieved the best behavior more often than kDCI++ did.
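To make the idea of direct counting concrete, here is a minimal sketch in Python of how the supports of all 2-itemsets can be accumulated in a flat triangular array, so that each candidate is reached by computing an index rather than by searching a candidate structure. This is a generic illustration and not the kDCI-3 procedure, which the abstract does not describe in detail. The function names (`pair_index`, `count_pairs_directly`, `frequent_pairs`) and the assumption that items are already mapped to integers 0..m-1 are hypothetical choices made for this example.

```python
from itertools import combinations


def pair_index(i, j, m):
    """Position of the pair (i, j), with i < j, in a flat triangular array over m items."""
    return i * m - i * (i + 1) // 2 + (j - i - 1)


def count_pairs_directly(transactions, m):
    """Count the support of every 2-itemset with one array increment per occurrence.

    Assumes items are integers in range(m) (an assumption of this sketch).
    """
    counts = [0] * (m * (m - 1) // 2)
    for transaction in transactions:
        items = sorted(set(transaction))  # drop duplicates, fix the item order
        for i, j in combinations(items, 2):
            counts[pair_index(i, j, m)] += 1  # direct access, no candidate search
    return counts


def frequent_pairs(transactions, m, min_support):
    """Return {pair: support} for pairs whose support reaches min_support."""
    counts = count_pairs_directly(transactions, m)
    result = {}
    for i, j in combinations(range(m), 2):
        support = counts[pair_index(i, j, m)]
        if support >= min_support:
            result[(i, j)] = support
    return result


if __name__ == "__main__":
    toy_db = [[0, 1, 2], [0, 2], [1, 2, 3], [0, 1, 2, 3]]
    print(frequent_pairs(toy_db, m=4, min_support=3))
```

The array needs m(m-1)/2 counters, which is affordable for pairs but grows quickly for longer itemsets. That growth is one reason direct access is usually limited to the early iterations.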
Similar resources
Concurrent Processing of Frequent Itemset Queries Using FP-Growth Algorithm
Discovery of frequent itemsets is a very important data mining problem with numerous applications. Frequent itemset mining is often regarded as advanced querying where a user specifies the source dataset and pattern constraints using a given constraint model. A significant amount of research on frequent itemset mining has been done so far, focusing mainly on developing faster complete mining al...
BISC: a Binary Itemset Support Counting Approach towards Efficient Frequent Itemset Mining
The performance of a depth-first Frequent Itemset Mining (FIM) algorithm is closely related to the total number of recursions, which can be modeled as O(n^k), where k is the maximal recursion depth and n is the branching factor. Many existing approaches focus more on improving support counting than on decreasing n and k, which may lead to unsatisfactory performance as they grow. In this pap...
TR-2009001: BISC: A Binary Itemset Support Counting Approach towards Efficient Frequent Itemset Mining
The performance of a depth-first Frequent Itemset Mining (FIM) algorithm is closely related to the total number of recursions, which can be modeled as O(n^k), where k is the maximal recursion depth and n is the branching factor. Many existing approaches focus more on improving support counting than on decreasing n and k, which may lead to unsatisfactory performance as they grow. In this pap...
Three Strategies for Concurrent Processing of Frequent Itemset Queries Using FP-Growth
Frequent itemset mining is often regarded as advanced querying where a user specifies the source dataset and pattern constraints using a given constraint model. Recently, a new problem of optimizing processing of sets of frequent itemset queries has been considered and two multiple query optimization techniques for frequent itemset queries: Mine Merge and Common Counting have been proposed and ...
Efficient Maximal Frequent Itemset Mining by Pattern-Aware Dynamic Scheduling
While frequent pattern mining is fundamental for many data mining tasks, mining maximal frequent itemsets efficiently is important in both theory and applications of frequent itemset mining. The fundamental challenge is how to search a large space of item combinations. Most of the existing methods search an enumeration tree of item combinations in a depth-first manner. In this thesis, we develop...